Goto

Collaborating Authors

 Centre County


Learning Functional Graphs with Nonlinear Sufficient Dimension Reduction

Kim, Kyongwon, Li, Bing

arXiv.org Machine Learning

Functional graphical models have undergone extensive development during the recent years, leading to a variety models such as the functional Gaussian graphical model, the functional copula Gaussian graphical model, the functional Bayesian graphical model, the nonparametric functional additive graphical model, and the conditional functional graphical model. These models rely either on some parametric form of distributions on random functions, or on additive conditional independence, a criterion that is different from probabilistic conditional independence. In this paper we introduce a nonparametric functional graphical model based on functional sufficient dimension reduction. Our method not only relaxes the Gaussian or copula Gaussian assumptions, but also enhances estimation accuracy by avoiding the ``curse of dimensionality''. Moreover, it retains the probabilistic conditional independence as the criterion to determine the absence of edges. By doing simulation study and analysis of the f-MRI dataset, we demonstrate the advantages of our method.


MINES: Explainable Anomaly Detection through Web API Invariant Inference

Zhang, Wenjie, Lin, Yun, Kwok, Chun Fung Amos, Teoh, Xiwen, Xie, Xiaofei, Liauw, Frank, Zhang, Hongyu, Dong, Jin Song

arXiv.org Artificial Intelligence

Detecting the anomalies of web applications, important infrastructures for running modern companies and governments, is crucial for providing reliable web services. Many modern web applications operate on web APIs (e.g., RESTful, SOAP, and WebSockets), their exposure invites intended attacks or unintended illegal visits, causing abnormal system behaviors. However, such anomalies can share very similar logs with normal logs, missing crucial information (which could be in database) for log discrimination. Further, log instances can be also noisy, which can further mislead the state-of-the-art log learning solutions to learn spurious correlation, resulting superficial models and rules for anomaly detection. In this work, we propose MINES which infers explainable API invariants for anomaly detection from the schema level instead of detailed raw log instances, which can (1) significantly discriminate noise in logs to identify precise normalities and (2) detect abnormal behaviors beyond the instrumented logs. Technically, MINES (1) converts API signatures into table schema to enhance the original database shema; and (2) infers the potential database constraints on the enhanced database schema to capture the potential relationships between APIs and database tables. MINES uses LLM for extracting potential relationship based on two given table structures; and use normal log instances to reject and accept LLM-generated invariants. Finally, MINES translates the inferred constraints into invariants to generate Python code for verifying the runtime logs. We extensively evaluate MINES on web-tamper attacks on the benchmarks of TrainTicket, NiceFish, Gitea, Mastodon, and NextCloud against baselines such as LogRobust, LogFormer, and WebNorm. The results show that MINES achieves high recall for the anomalies while introducing almost zero false positives, indicating a new state-of-the-art.


LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving

Pan, Muyu, Walter, Matthew, Kodakandla, Dheeraj, Farooque, Mahfuza

arXiv.org Artificial Intelligence

Our work presents a novel reinforcement learning (RL) based framework to optimize heuristic selection within the conflict-driven clause learning (CDCL) process, improving the efficiency of Boolean satisfia-bility (SAT) solving. The proposed system, LangSAT, bridges the gap between natural language inputs and propositional logic by converting English descriptions into Conjunctive Normal Form (CNF) expressions and solving them using an RL-enhanced CDCL SAT solver. Unlike existing SAT-solving platforms that require CNF as input, LangSAT enables users to input standard English descriptions, making SAT-solving more accessible. The framework comprises two key components: Lang2Logic, which translates English sentences into CNF expressions, and SmartSAT, an RL-based SAT solver. SmartSAT encodes clause-variable relationships as structured graph representations and extracts global features specific to the SAT problem. This implementation provides the RL agent with deeper contextual information, enabling SAT problems to be solved more efficiently. Lang2Logic was evaluated on diverse natural language inputs, processing descriptions up to 450 words. The generated CNFs were solved by SmartSAT, which demonstrated comparable performance to traditional CDCL heuristics with respect to solving time. The combined LangSAT framework offers a more accessible and scalable solution for SAT-solving tasks across reasoning, formal verification, and debugging.


Data-Driven Predictive Modeling of Microfluidic Cancer Cell Separation Using a Deterministic Lateral Displacement Device

Chen, Elizabeth, Lee, Andrew, Sarowar, Tanbir, Chen, Xiaolin

arXiv.org Artificial Intelligence

Deterministic Lateral Displacement (DLD) devices are widely used in microfluidics for label-free, size-based separation of particles and cells, with particular promise in isolating circulating tumor cells (CTCs) for early cancer diagnostics. This study focuses on the optimization of DLD design parameters, such as row shift fraction, post size, and gap distance, to enhance the selective isolation of lung cancer cells based on their physical properties. To overcome the challenges of rare CTC detection and reduce reliance on computationally intensive simulations, machine learning models including gradient boosting, k-nearest neighbors, random forest, and multilayer perceptron (MLP) regressors are employed. Trained on a large, numerically validated dataset, these models predict particle trajectories and identify optimal device configurations, enabling high-throughput and cost-effective DLD design. Beyond trajectory prediction, the models aid in isolating critical design variables, offering a systematic, data-driven framework for automated DLD optimization. This integrative approach advances the development of scalable and precise microfluidic systems for cancer diagnostics, contributing to the broader goals of early detection and personalized medicine.


A Weak Penalty Neural ODE for Learning Chaotic Dynamics from Noisy Time Series

Li, Xuyang, Harlim, John, Chakraborty, Dibyajyoti, Maulik, Romit

arXiv.org Artificial Intelligence

Accurate forecasting of complex high-dimensional dynamical systems from observational data is essential for several applications across science and engineering. A key challenge, however, is that real-world measurements are often corrupted by noise, which severely degrades the performance of data-driven models. Particularly, in chaotic dynamical systems, where small errors amplify rapidly, it is challenging to identify a data-driven model from noisy data that achieves short-term accuracy while preserving long-term invariant properties. In this paper, we propose the use of the weak formulation as a complementary approach to the classical strong formulation of data-driven time-series forecasting models. Specifically, we focus on the neural ordinary differential equation (NODE) architecture. Unlike the standard strong formulation, which relies on the discretization of the NODE followed by optimization, the weak formulation constrains the model using a set of integrated residuals over temporal subdomains. While such a formulation yields an effective NODE model, we discover that the performance of a NODE can be further enhanced by employing this weak formulation as a penalty alongside the classical strong formulation-based learning. Through numerical demonstrations, we illustrate that our proposed training strategy, which we coined as the Weak-Penalty NODE (WP-NODE), achieves state-of-the-art forecasting accuracy and exceptional robustness across benchmark chaotic dynamical systems and real-world climate dataset.




Interfacial and bulk switching MoS2 memristors for an all-2D reservoir computing framework

Thool, Asmita S., Roy, Sourodeep, Barman, Prahalad Kanti, Biswas, Kartick, Nukala, Pavan, Misra, Abhishek, Das, Saptarshi, Chakrabarti, and Bhaswar

arXiv.org Artificial Intelligence

In this study, we design a reservoir computing (RC) network by exploiting short- and long-term memory dynamics in Au/Ti/MoS$_2$/Au memristive devices. The temporal dynamics is engineered by controlling the thickness of the Chemical Vapor Deposited (CVD) MoS$_2$ films. Devices with a monolayer (1L)-MoS$_2$ film exhibit volatile (short-term memory) switching dynamics. We also report non-volatile resistance switching with excellent uniformity and analog behavior in conductance tuning for the multilayer (ML) MoS$_2$ memristive devices. We correlate this performance with trap-assisted space-charge limited conduction (SCLC) mechanism, leading to a bulk-limited resistance switching behavior. Four-bit reservoir states are generated using volatile memristors. The readout layer is implemented with an array of nonvolatile synapses. This small RC network achieves 89.56\% precision in a spoken-digit recognition task and is also used to analyze a nonlinear time series equation.